Positive Data Clustering Using Finite Inverted Dirichlet Mixture Models
Author
Taoufik BDIRI
Abstract
In this thesis we present an unsupervised algorithm for learning finite mixture models from multivariate positive data. This kind of data appears naturally in many applications, yet it has not been adequately addressed in the past. The mixture model is based on the inverted Dirichlet distribution, which offers a good representation of positive non-Gaussian data. The parameters of the inverted Dirichlet mixture are estimated by maximum likelihood (ML) using the Newton-Raphson method. We also develop an approach, based on the Minimum Message Length (MML) criterion, to select the optimal number of clusters to represent the data using such a mixture. Experimental results are presented using artificial histograms and real data sets. The challenging problem of software module classification is also investigated within the proposed statistical framework.
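To make the model concrete, here is a minimal sketch of the quantity that maximum likelihood estimation would maximize for such a mixture. It assumes the standard inverted Dirichlet density for a positive vector x of dimension D with D + 1 positive parameters, p(x | alpha) = Gamma(|alpha|) / prod_d Gamma(alpha_d) * prod_{d<=D} x_d^(alpha_d - 1) * (1 + sum_d x_d)^(-|alpha|). The function names are hypothetical, and the thesis's Newton-Raphson updates and MML criterion are not reproduced here.

```python
import math


def inverted_dirichlet_logpdf(x, alpha):
    """Log-density of an inverted Dirichlet at a strictly positive vector x.

    x has D entries; alpha has D + 1 strictly positive entries.
    (Illustrative helper, not taken from the thesis.)
    """
    if len(alpha) != len(x) + 1:
        raise ValueError("alpha must have exactly one more entry than x")
    if any(v <= 0 for v in x) or any(a <= 0 for a in alpha):
        raise ValueError("x and alpha must be strictly positive")
    alpha_sum = sum(alpha)
    # Normalizing constant: log Gamma(|alpha|) - sum_d log Gamma(alpha_d).
    log_norm = math.lgamma(alpha_sum) - sum(math.lgamma(a) for a in alpha)
    # zip stops after D pairs, so only the first D parameters meet x.
    log_kernel = sum((a - 1.0) * math.log(v) for a, v in zip(alpha, x))
    return log_norm + log_kernel - alpha_sum * math.log1p(sum(x))


def mixture_log_likelihood(data, weights, alphas):
    """Log-likelihood of data under a finite inverted Dirichlet mixture.

    weights are the mixing proportions (positive, summing to 1);
    alphas holds one parameter vector per component.
    """
    total = 0.0
    for x in data:
        terms = [
            math.log(w) + inverted_dirichlet_logpdf(x, a)
            for w, a in zip(weights, alphas)
        ]
        # Log-sum-exp keeps the sum over components numerically stable.
        m = max(terms)
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total
```

For example, `mixture_log_likelihood([[0.5, 2.0], [1.2, 0.3]], [0.6, 0.4], [[2.0, 3.0, 4.0], [5.0, 1.5, 2.5]])` evaluates a two-component mixture on two bivariate points. An ML procedure would search for the weights and parameter vectors that maximize this value for a fixed number of components.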
Similar resources
Variational Learning for Finite Inverted Dirichlet Mixture Models and Its Applications
Parisa Tirdad. Clustering is an important step in data mining, machine learning, computer vision and image processing. It is the process of assigning similar objects to the same subset. Among available clustering techniques, finite mixture models have been widely used, since they have the ability to consid...
An Infinite Mixture Model of Generalized Inverted Dirichlet Distributions for High-Dimensional Positive Data Modeling
We propose an infinite mixture model for the clustering of positive data. The proposed model is based on the generalized inverted Dirichlet distribution which has a more general covariance structure than the inverted Dirichlet that has been widely used recently in several machine learning and data mining applications. The proposed mixture is developed in an elegant way that allows simultaneous ...
Positive Data Clustering based on Generalized Inverted Dirichlet Mixture Model
Mohamed Al Mashrgy, Ph.D., Concordia University, 2015. Recent advances in processing and networking capabilities of computers have caused an accumulation of immense amounts of multimodal multimedia data (image, text, video). These data are generally presented as high-dimensional vectors of features. The availability of...
An Infinite Mixture of Inverted Dirichlet Distributions
In this paper we present an infinite mixture model based on inverted Dirichlet distributions. The proposed mixture is learned using a fully Bayesian approach and makes it possible to overcome a challenging issue in data clustering, namely the automatic selection of the number of clusters. We explore the performance of the proposed approach on the challenging problem of text categorization. T...
Collapsed Variational Dirichlet Process Mixture Models
Nonparametric Bayesian mixture models, in particular Dirichlet process (DP) mixture models, have shown great promise for density estimation and data clustering. Given the size of today’s datasets, computational efficiency becomes an essential ingredient in the applicability of these techniques to real world data. We study and experimentally compare a number of variational Bayesian (VB) approxim...